Skip to content

Conversation

mizvekov
Copy link
Contributor

@mizvekov mizvekov commented Sep 18, 2025

This makes sure the tuple sizes remain within implementation limits, and this doesn't cause the compiler to crash later, as the tuple size is assumed to fit within an UnsignedOrNone.

Fixes #159563

@mizvekov mizvekov self-assigned this Sep 18, 2025
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-clang

Author: Matheus Izvekov (mizvekov)

Changes

Fixes #159563


Full diff: https://github.com/llvm/llvm-project/pull/159677.diff

4 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+4-2)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+3)
  • (modified) clang/lib/Sema/SemaDeclCXX.cpp (+10)
  • (modified) clang/test/SemaCXX/builtin-structured-binding-size.cpp (+18-2)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c898784b3f93e..ec14594761f3c 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -262,6 +262,8 @@ Improvements to Clang's diagnostics
   Moved the warning for a missing (though implied) attribute on a redeclaration into this group.
   Added a new warning in this group for the case where the attribute is missing/implicit on
   an override of a virtual method.
+- Implemented diagnostics when retrieving the tuple size for types where its specialization of `std::tuple_size`
+  produces an invalid size (either negative or greater than the implementation limit). (#GH159563)
 - Fixed fix-it hint for fold expressions. Clang now correctly places the suggested right
   parenthesis when diagnosing malformed fold expressions. (#GH151787)
 - Added fix-it hint for when scoped enumerations require explicit conversions for binary operations. (#GH24265)
@@ -347,8 +349,8 @@ Bug Fixes in This Version
   and vector of 4 ``float`` values. (#GH155405)
 - Fixed inconsistent shadow warnings for lambda capture of structured bindings.
   Previously, ``[val = val]`` (regular parameter) produced no warnings with ``-Wshadow``
-  while ``[a = a]`` (where ``a`` is from ``auto [a, b] = std::make_pair(1, 2)``) 
-  incorrectly produced warnings. Both cases now consistently show no warnings with 
+  while ``[a = a]`` (where ``a`` is from ``auto [a, b] = std::make_pair(1, 2)``)
+  incorrectly produced warnings. Both cases now consistently show no warnings with
   ``-Wshadow`` and show uncaptured-local warnings with ``-Wshadow-all``. (#GH68605)
 - Fixed a failed assertion with a negative limit parameter value inside of
   ``__has_embed``. (#GH157842)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 8b862ae47af89..42608debe8609 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -638,6 +638,9 @@ def err_decomp_decl_std_tuple_element_not_specialized : Error<
 def err_decomp_decl_std_tuple_size_not_constant : Error<
   "cannot decompose this type; 'std::tuple_size<%0>::value' "
   "is not a valid integral constant expression">;
+def err_decomp_decl_std_tuple_size_invalid
+    : Error<"cannot decompose this type; 'std::tuple_size<%0>::value' "
+            "is not a valid size: %1">;
 def note_in_binding_decl_init : Note<
   "in implicit initialization of binding declaration %0">;
 def err_arg_is_not_destructurable : Error<
diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index fb57b43882911..e133022426cff 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -1222,6 +1222,16 @@ static IsTupleLike isTupleLike(Sema &S, SourceLocation Loc, QualType T,
   if (E.isInvalid())
     return IsTupleLike::Error;
 
+  if (Size < 0 || Size >= UINT_MAX) {
+    llvm::SmallVector<char, 16> Str;
+    Size.toString(Str);
+    S.Diag(Loc, diag::err_decomp_decl_std_tuple_size_invalid)
+        << printTemplateArgs(S.Context.getPrintingPolicy(), Args,
+                             /*Params=*/nullptr)
+        << StringRef(Str.data(), Str.size());
+    return IsTupleLike::Error;
+  }
+
   return IsTupleLike::TupleLike;
 }
 
diff --git a/clang/test/SemaCXX/builtin-structured-binding-size.cpp b/clang/test/SemaCXX/builtin-structured-binding-size.cpp
index 53576048754ab..bcd13a6fb720d 100644
--- a/clang/test/SemaCXX/builtin-structured-binding-size.cpp
+++ b/clang/test/SemaCXX/builtin-structured-binding-size.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_cc1 %s -std=c++2c -fsyntax-only -verify
-// RUN: %clang_cc1 %s -std=c++2c -fsyntax-only -verify -fexperimental-new-constant-interpreter
+// RUN: %clang_cc1 %s -triple=x86_64 -std=c++2c -fsyntax-only -verify
+// RUN: %clang_cc1 %s -triple=x86_64 -std=c++2c -fsyntax-only -verify -fexperimental-new-constant-interpreter
 
 
 struct S0 {};
@@ -229,3 +229,19 @@ static_assert(__is_same_as(tag_of_t<S1>, int));
 static_assert(__is_same_as(tag_of_t<int>, int)); // error
 // expected-error@-1 {{constraints not satisfied for alias template 'tag_of_t' [with T = int]}}
 // expected-note@#tag-of-constr {{because substituted constraint expression is ill-formed: type 'int' cannot be decomposed}}
+
+struct MinusOne;
+template <> struct ::std::tuple_size<MinusOne> {
+  static constexpr int value = -1;
+};
+int minus_one = __builtin_structured_binding_size(MinusOne);
+// expected-error@-1 {{cannot decompose this type; 'std::tuple_size<MinusOne>::value' is not a valid size: -1}}
+// expected-error@-2 {{type 'MinusOne' cannot be decomposed}}
+
+struct UintMax;
+template <> struct ::std::tuple_size<UintMax> {
+  static constexpr unsigned value = -1;
+};
+int uint_max = __builtin_structured_binding_size(UintMax);
+// expected-error@-1 {{cannot decompose this type; 'std::tuple_size<UintMax>::value' is not a valid size: 4294967295}}
+// expected-error@-2 {{type 'UintMax' cannot be decomposed}}

@mizvekov mizvekov requested a review from shafik September 18, 2025 23:33
Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a more complete description of the actual underlying problem something along the lines of what I added here: #159579 (comment)

It took me and others a bit to get the original problem and so the description is valuable.

Also this is important documentation for the git log and future devs.


if (Size < 0 || Size >= UINT_MAX) {
llvm::SmallVector<char, 16> Str;
Size.toString(Str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't format the value and it's lower than UINT_MAX anyway, just calling getExtValue() should work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean, here the value can be arbitrarily larger than UINT_MAX, so calling getExtValue can truncate them.

if (E.isInvalid())
return IsTupleLike::Error;

if (Size < 0 || Size >= UINT_MAX) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this UINT_MAX but the calling code uses Ctx.getSizeType()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't seem all that right. I think we should be checking if (Size.isNegative() || !Size.isSignedIntN(Ctx.getTypeSize(Ctx.getSizeType())) or something like that like that ?

(should that just be isIntN?).

host-compiler UINT_MAX is almost definitely the wrong thing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host compiler uintmax is the right thing, because we pass this down on an UnsignedOrNone, which uses unsigned as the representation, so the implementation limit is uintmax-1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! sure, yeah, that is reasonable.

@cor3ntin
Copy link
Contributor

@tbaederr @mizvekov Is the plan for Tim's patch to be backported and for this one to be in 22? This would make sense.
I reported a core issue here https://cplusplus.github.io/CWG/issues/3071.html


if (Size < 0 || Size >= UINT_MAX) {
llvm::SmallVector<char, 16> Str;
Size.toString(Str);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the separately formatting? ADT/APInt.h already ahs an operator <<, so we should be able to just << Size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a raw_ostream though. We usually pass toString(Size, 10) or similar.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hrmph... It seems to me that StreamingDiagnostic should have something that defers to raw_ostream but I see we don't have one. That is unfortunate.

if (E.isInvalid())
return IsTupleLike::Error;

if (Size < 0 || Size >= UINT_MAX) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't seem all that right. I think we should be checking if (Size.isNegative() || !Size.isSignedIntN(Ctx.getTypeSize(Ctx.getSizeType())) or something like that like that ?

(should that just be isIntN?).

host-compiler UINT_MAX is almost definitely the wrong thing here.

@mizvekov
Copy link
Contributor Author

@tbaederr @mizvekov Is the plan for Tim's patch to be backported and for this one to be in 22? This would make sense. I reported a core issue here https://cplusplus.github.io/CWG/issues/3071.html

I mean do we need to use the other patch at all? Why not just backport this one?

Note that besides not producing a diagnostic with an explanation for the negative case, it also allows values larger than the implementation limit, which will cause wrap around in the compiler, potentially leading to compiler UB anyway.

@mizvekov mizvekov force-pushed the users/mizvekov/GH159563 branch from 94efaa9 to 2b8e0e6 Compare September 19, 2025 17:54
@zyn0217
Copy link
Contributor

zyn0217 commented Sep 19, 2025

I mean do we need to use the other patch at all? Why not just backport this one?

I remembered adding a new diagnostic in a backport patch could cause an ABI break, since we can somehow change the enumeration order in the tablegen files.

For example, #139396 (review)

@mizvekov mizvekov force-pushed the users/mizvekov/GH159563 branch from 2b8e0e6 to 97daa1e Compare September 19, 2025 18:21
@erichkeane
Copy link
Collaborator

I mean do we need to use the other patch at all? Why not just backport this one?

I remembered adding a new diagnostic in a backport patch could cause an ABI break, since we can somehow change the enumeration order in the tablegen files.

For example, #139396 (review)

Yeah, we can't backport new diagnostics unfortunately.

@mizvekov mizvekov merged commit 46ce6a0 into main Sep 19, 2025
8 of 10 checks passed
@mizvekov mizvekov deleted the users/mizvekov/GH159563 branch September 19, 2025 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang] Assertion `operator bool()' failed.

7 participants